@TASK
sf map
object.geom_sf() to map points,
linestrings and polygonsThis lesson requires the following packages:
if(!require('pacman')) install.packages('pacman')
pacman::p_load_gh("afrimapr/afrilearndata")
pacman::p_load_gh("ropensci/rnaturalearthhires")
pacman::p_load(ggspatial,
ggplot2,
rnaturalearth,
spData,
mapview,
tidyverse)rnaturalearthAs an introduction to plotting maps in R, let’s look at how to draw a simple world map with country borders. The package {rnaturalearth} contains information to map all the countries in the world, among other things.
To obtain this map information, use the ne_countries()
function, with the argument returnclass = "sf".
countries <- ne_countries(returnclass = "sf")The code returns an sf object with the shapes for all
countries. We will soon explain in detail what an “sf” object is. For
now, just note that it is a special kind of data frame for geospatial
data in R:
class(countries)## [1] "sf" "data.frame"
Now, the countries object can be plotted very easily
with the geom_sf function of {ggplot2}:
ggplot(data = countries) +
geom_sf()Wonderful! (Almost too easy!)
To subset to a specific continent, use the
continent argument of ne_countries():
# Countries in South America
south_am <- ne_countries(returnclass = "sf",
continent = "south america")
ggplot(data = south_am) +
geom_sf()The continent argument can accept a vector with multiple
continents:
# Countries in north and south america
north_south_am <- ne_countries(returnclass = "sf",
continent = c("north america", "south america"))
ggplot(data = north_south_am) +
geom_sf()To subset to a specific country or specific
countries, use the country argument:
# Map of Nigeria and Niger
nigeria_niger <- ne_countries(returnclass = "sf",
country = c("nigeria", "niger"))
ggplot(data = nigeria_niger) +
geom_sf()◘ Use ne_countries(), ggplot() and
geom_sf() to plot a single map of the countries in Asia and
Africa
q_asia_africa <- ne_countries(returnclass = "sf",
continent = c("asia", "africa"))
ggplot(data = asia_africa) +
geom_sf()◘ Use ne_countries(), ggplot() and
geom_sf() to plot a single map of the national borders of
China and Indonesia
china_indonesia <- ne_countries(returnclass = "sf",
country = c("china", "indonesia"))
ggplot(data = china_indonesia) +
geom_sf()sf objectsSo far we’ve been passing these sf objects into ggplot
without thinking about their underlying structure. Let’s now look under
the hood to understand sf objects better. In the process,
you will encounter several key concepts you’ll need to understand when
working with geospatial data in R.
First of all, what does the acronym “sf” mean? It stands for Simple Features, which is a set of widely-used standards for storing geospatial information in databases. The details of these standards are beyond the scope of this course; just know that the {sf} R package was written to bring spatial data analysis in R closer towards these Simple Features standards. {sf} largely replaces an older R package for geospatial analysis {sp}, although you may still need to use {sp} once in a while.
Now, what do sf objects look like and how do we work
with them? To answer this, we’ll look at a slice of the
countries object previously defined. Since this
sf object is just a special kind of data frame, we can
manipulate it with standard functions like dplyr::select().
So let’s select just three columns to make the object easier to
observe:
countries %>%
select(name, # country name
pop_est, # estimated population
geometry)## Simple feature collection with 177 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## Geodetic CRS: SOURCECRS
## First 10 features:
## name pop_est geometry
## 0 Afghanistan 28400000 MULTIPOLYGON (((61.21082 35...
## 1 Angola 12799293 MULTIPOLYGON (((16.32653 -5...
## 2 Albania 3639453 MULTIPOLYGON (((20.59025 41...
## 3 United Arab Emirates 4798491 MULTIPOLYGON (((51.57952 24...
## 4 Argentina 40913584 MULTIPOLYGON (((-65.5 -55.2...
## 5 Armenia 2967004 MULTIPOLYGON (((43.58275 41...
## 6 Antarctica 3802 MULTIPOLYGON (((-59.57209 -...
## 7 Fr. S. Antarctic Lands 140 MULTIPOLYGON (((68.935 -48....
## 8 Australia 21262641 MULTIPOLYGON (((145.398 -40...
## 9 Austria 8210281 MULTIPOLYGON (((16.97967 48...
What do we see? The object consists of a 5-line header and a data frame.
sf
headerThe header provides some contextualizing information about the rest of the object. You usually don’t need to pay too much attention to this header, but we will go through it in some detail here, because it provides an opportunity to learn a few key terms in geospatial analysis.
Let’s go line-by-line through this header to see what these terms mean:
The first line of the header tells you the number of
features and fields in the
sf object:
👉Simple feature collection with 177 features and 2 fields👈
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: SOURCECRS
Features are simply the geographical objects
represented by each row of the data frame. In our countries
dataset, each country has its own row; therefore each country is a
feature.
And what are fields? These are the attributes that
pertain to each feature in the data. In our countries
dataset, the fields include “name”, the name of each country, and
“pop_est”, its estimated population. Fields are essentially equivalent
to columns in the data frame, although the “geometry” column does not
count as a field.
Fig: Features and fields of an `sf` object
The spData::nz dataset contains mapping information for
the regions of New Zealand. How many features and fields does the
dataset have?
# Delete the wrong lines and run the correct line:
q_nz_features_fields <- "A. 16 features and 6 fields"
q_nz_features_fields <- "B. 20 features and 5 fields"
q_nz_features_fields <- "C. 5 features and 4 fields"The second line of the header gives you the type of geometry in the
sf object:
Simple feature collection with 177 features and 2 fields
👉Geometry type: MULTIPOLYGON👈
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: SOURCECRS
“Geometry” is essentially a synonym for “shape”. There are three main geometry types: points, lines and polygons. Each of these has its respective “multi” version: multipoints, multilines and multipolygons.
The figure below outlines these main types of geometries.
Fig: Geometry types and example maps for each. Points, lines (or linestrings) and polygons are the most common geometries you will encounter.
In the case of the countries object, the geometry type
is stated as “MULTIPOLYGON”. What does this mean? Some countries are
made of just one polygon (that is, a single bounded shape), but others
are composed of multiple polygons. For example, Indonesia is composed of
several islands, each of which is its own polygon. But each
sf object can have only one geometry type, so all countries
are “forced” to have the MULTIPOLYGON geometry type.
So now you’re familiar with what “polygon” geometries look like. Let’s then look at geospatial datasets with “point” and “linestring” geometries.
Points
As described in the image above, points are simply individual x,y locations. They could represent the location of a case (e.g. a patient home), small objects like wells or trees, or the center points (called “centroids”) of larger objects like buildings or entire cities. Let’s see one example now.
The ne_download() function from {rnaturalearth} can be
used to obtain geospatial information for mapping some major world
ports:
world_ports <-
ne_download(scale = 10,
type = "ports",
returnclass = "sf") ## OGR data source with driver: ESRI Shapefile
## Source: "/private/var/folders/vr/shb6ffvj2rl61kh7qqczhrgh0000gn/T/Rtmp3w83vy", layer: "ne_10m_ports"
## with 1081 features
## It has 6 fields
## Integer64 fields read as strings: ne_id
(Do consult the {rnaturalearth} documentation at docs.ropensci.org/rnaturalearth
to learn how to use ne_download() to download many other
interesting geospatial datasets.)
When we print this world_ports object, we can see the
“POINT” geometry is referenced on line 2 of the header, and also in each
row of the geometry column:
world_ports %>%
select(name, geometry) %>%
head()## Simple feature collection with 6 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -69.92356 ymin: -38.89444 xmax: -58.95141 ymax: 12.4375
## Geodetic CRS: WGS 84
## name geometry
## 1 Sint Nicolaas POINT (-69.92356 12.4375)
## 2 Campana POINT (-58.95141 -34.15333)
## 3 Zarate POINT (-59.00495 -34.09889)
## 4 Puerto Belgrano/Bahia Blanca POINT (-62.10088 -38.89444)
## 5 Puerto Galvan/Bahia Blanca POINT (-62.30053 -38.78306)
## 6 Ingeniero White/Bahia Blanca POINT (-62.25989 -38.79194)
Each point has two coordinates—in this case longitude and latitude—that describe its position. For the first feature, “Sint Nicolaas”, the point coordinates are -69.92356 and 12.4375. The first number, -69.9…, refers to longitude 69.9° West, and 12.4… refers to latitude 12.4° North.
As before, we can plot this world_ports object directly
with ggplot() and geom_sf(), to map the
locations of all points:
ggplot(data = world_ports) +
geom_sf()Very cool! You have now plotted the point geometries representing the major world ports.
Lines
Lines or linestrings consist of two or more interconnected points. They are used to represent objects like roads, rivers or utility lines. Let’s look at one example.
The ne_download() function from {rnaturalearth} can be
used to obtain geospatial information for mapping some major world
rivers:
world_rivers <-
ne_download(scale = 50,
category = "physical",
type = "rivers_lake_centerlines",
returnclass = "sf") ## OGR data source with driver: ESRI Shapefile
## Source: "/private/var/folders/vr/shb6ffvj2rl61kh7qqczhrgh0000gn/T/Rtmp3w83vy", layer: "ne_50m_rivers_lake_centerlines"
## with 478 features
## It has 36 fields
## Integer64 fields read as strings: ne_id
When we print this world_rivers object, we can see the
“MULTILINESTRING” geometry referenced on line 2 of the header, and in
each row of the geometry column:
world_rivers %>%
select(name_en, geometry) %>%
head()## Simple feature collection with 6 features and 1 field
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: -91.97004 ymin: 11.63229 xmax: 56.68579 ymax: 60.50012
## Geodetic CRS: WGS 84
## name_en geometry
## 0 Kama MULTILINESTRING ((51.93713 ...
## 1 Kama MULTILINESTRING ((53.69385 ...
## 2 Lesser Abay MULTILINESTRING ((37.11301 ...
## 3 Euphrates MULTILINESTRING ((38.56119 ...
## 4 Alabama MULTILINESTRING ((-86.52177...
## 5 Albany MULTILINESTRING ((-91.31225...
The geometry type is “multilinestring”, rather than just
“linestring”, because some rivers have breaks or forks along their
length, and so must be represented by more than one line. For example,
the “Chico River” in the Philippines (from the world_rivers
dataset we downloaded) has two distinct segments, each of which must be
represented by its own linestring:
As with the polygon and point geometry objects, we can plot this
object directly with ggplot() and
geom_sf():
ggplot(world_rivers) +
geom_sf()Nice!
It is possible (though rarely necessary) to extract and print to console the entire list of coordinates for a linestring or polygon:
## Print the linestring for the first row in `world_rivers`
world_rivers[["geometry"]][[1]] MULTILINESTRING ((22.75649 -20.47161, 22.79458 -20.43499, 22.80981 -20.42571, 22.85693 -20.40462, 22.91909 -20.38948, 22.93091 -20.38372, 22.98672 -20.34564, 23.01089 -20.33577, 23.10303 -20.31097, 23.11929 -20.30423, 23.27441 -20.15775, 23.32788 -20.1242))
In this linestring, the first point has a coordinates 22.75649 and -20.47161, the second point has coordinates 22.79458 -20.43499, and so on.
The tables below shows the “raw” coordinate representations for other geometries:
Points, lines and polygons
The afrilearndata::afriairports contains geographic data
on airports in Africa.
◘ What type of geometry is used to represent each capital?
# Delete the wrong lines and run the correct line:
q_airports_geom_type <- "LINESTRING"
q_airports_geom_type <- "POLYGON"
q_airports_geom_type <- "POINT"◘ Plot the airports with ggplot() and
geom_sf().
q_airports_plot <-
ggplot(data = afrilearndata::afriairports) +
geom_sf()
q_airports_plotThe ne_download() function from {rnaturalearth} can be
used to obtain a map of major world roads, using the code below:
roads <-
ne_download(scale = 10,
category = "physical",
type = "geographic_lines",
returnclass = "sf") ◘ What type of geometry is used to represent the rivers?
# Delete the wrong lines and run the correct line:
q_rivers_geom_type <- "MULTILINESTRING"
q_rivers_geom_type <- "MULTIPOLYGON"
q_rivers_geom_type <- "MULTIPOINT"◘ Plot the rivers with ggplot() and
geom_sf().
q_rivers_plot <-
ggplot(data = rivers) +
geom_sf()
q_rivers_plotGreat work! Hopefully you now understand the three main kinds of geometry and how these can be plotted. Soon you will see how to add additional information to these geometries to create thematic maps.
The third header line gives the “dimension” of the sf
object:
Simple feature collection with 177 features and 2 fields
Geometry type: MULTIPOLYGON
👉Dimension: XY👈
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: SOURCECRS
This simply refers to the Cartesian coordinate system for this object. An object with two-dimensional features has XY coordinates, while one with three-dimensional features has XYZ coordinates.
In this course chapter we will not deal with any three-dimensional features; so the “XY” dimension is all you will see.
The fourth line of the header provides the coordinates of the
sf object’s bounding box:
Simple feature collection with 177 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
👉Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513👈
Geodetic CRS: SOURCECRS
The bounding box is the smallest rectangle that can surround all the features in the object. The box is represented by its lower-left and upper-right coordinates, as illustrated in the figure below:
Fig: An example bounding box, for the country of Turkey
For our countries map, the bounding box is written in terms of
longitude and latitude:
xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513.
This is equivalent to:
180°W, 90°E, 180°E, 83.64513°S.
The spData::nz dataset contains mapping information for
the regions of New Zealand. Observe the bounding box for this object.
Does it appear to be in terms of latitude and longitudes?
# Delete the wrong lines and run the correct line:
q_nz_bounding_box <- "A. No, it's not in terms of latitude and longitude."
q_nz_bounding_box <- "B. Yes, it is in terms of latitude and longitude."The final header line tells us what coordinate reference system used.
Simple feature collection with 177 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
👉Geodetic CRS: SOURCECRS👈
You can ignore this for now; it will be the focus of a later chapter.
Phew! That was a lot of information. Congratulations on getting through it.
In the process of exploring the components of an sf
header, we have also explained several of the core concepts you will
need to know for your geospatial work. We will refer back to these
concepts throughout this course chapter.
sf
data frameNow that we have covered most of the terms in the sf
header, let’s consider the actual data frame that holds the geospatial
information.
First, we’ll define a new object, countries_select, with
only three columns, that we can manipulate and view easily.
countries_select <-
ne_countries(returnclass = "sf") %>%
select(name, # country name
pop_est) # estimated population
head(countries_select)## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -73.41544 ymin: -55.25 xmax: 75.15803 ymax: 42.68825
## Geodetic CRS: SOURCECRS
## name pop_est geometry
## 0 Afghanistan 28400000 MULTIPOLYGON (((61.21082 35...
## 1 Angola 12799293 MULTIPOLYGON (((16.32653 -5...
## 2 Albania 3639453 MULTIPOLYGON (((20.59025 41...
## 3 United Arab Emirates 4798491 MULTIPOLYGON (((51.57952 24...
## 4 Argentina 40913584 MULTIPOLYGON (((-65.5 -55.2...
## 5 Armenia 2967004 MULTIPOLYGON (((43.58275 41...
The primary property that makes an sf data frame soecial
is its geometry column. This is the column that holds the core
geospatial data (points, linestrings or polygons). Let’s go through some
noteworthy points about this column.
The geometry column can be renamed
In all our previous examples, the geometry column was literally called “geometry”, but this does not have to be the case. We can rename it to any arbitrary name. For example, below, we rename the “geometry” column to “geographic_info”:
countries_new_col_name <-
countries_select %>%
rename(geographic_info = geometry)
head(countries_new_col_name)## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -73.41544 ymin: -55.25 xmax: 75.15803 ymax: 42.68825
## Geodetic CRS: SOURCECRS
## name pop_est geographic_info
## 0 Afghanistan 28400000 MULTIPOLYGON (((61.21082 35...
## 1 Angola 12799293 MULTIPOLYGON (((16.32653 -5...
## 2 Albania 3639453 MULTIPOLYGON (((20.59025 41...
## 3 United Arab Emirates 4798491 MULTIPOLYGON (((51.57952 24...
## 4 Argentina 40913584 MULTIPOLYGON (((-65.5 -55.2...
## 5 Armenia 2967004 MULTIPOLYGON (((43.58275 41...
As usual, {ggplot} will recognize and plot this column, despite the new name:
ggplot(countries_new_col_name) +
geom_sf()The geometry column can’t be dropped
The geometry column cannot normally de dropped.
Recall that further above, we ran the following select()
statement that asked for only two columns, “name” and “pop_est”:
countries_select <-
countries %>%
select(name, # country name
pop_est) # estimated populationbut this command returned three columns: “name”, “pop_est” and “geometry”. Why?
Since the “geometry” column holds the core geospatial data for an
sf object, it is always included—it cannot be dropped in
the normal ways.
To drop this column, you have to convert the object to a regular data
frame with as.data.frame() or as_tibble()
first:
countries_tibble <- as_tibble(countries_select)
head(countries_tibble)## # A tibble: 6 × 3
## name pop_est geometry
## <chr> <dbl> <MULTIPOLYGON [°]>
## 1 Afghanistan 28400000 (((61.21082 35.65007, 62.23065 35.27066, 62.984…
## 2 Angola 12799293 (((16.32653 -5.87747, 16.57318 -6.622645, 16.86…
## 3 Albania 3639453 (((20.59025 41.8554, 20.46318 41.51509, 20.6051…
## 4 United Arab Emirates 4798491 (((51.57952 24.2455, 51.75744 24.29407, 51.7943…
## 5 Argentina 40913584 (((-65.5 -55.2, -66.45 -55.25, -66.95992 -54.89…
## 6 Armenia 2967004 (((43.58275 41.09214, 44.97248 41.24813, 45.179…
countries_tibble %>%
select(name, pop_est) %>%
head()## # A tibble: 6 × 2
## name pop_est
## <chr> <dbl>
## 1 Afghanistan 28400000
## 2 Angola 12799293
## 3 Albania 3639453
## 4 United Arab Emirates 4798491
## 5 Argentina 40913584
## 6 Armenia 2967004
(Of course, you usually should not need to drop the geometry column
from your sf objects.
geom_sf() automatically recognizes the geometry
column
So far you have been passing the sf object into
geom_sf() without adding any arguments, for example:
ggplot(data = countries_select) +
geom_sf()This is because geom_sf() automatically recognizes the
geometry column and passes it in as an aesthetic. You can make this more
explicit, by passing the column manually to the geometry
argument:
ggplot(data = countries_select) +
geom_sf(mapping = aes(geometry = geometry))For the countries_new_col_name defined above, in which
we renamed geometry column, the code would look like this:
ggplot(data = countries_new_col_name) +
geom_sf(mapping = aes(geometry = geographic_info))sf data frameApart from the geometry column, sf data frames are like
regular data frames, and can be manipulated as such. For example, we can
filter the data frame to specific rows in order to plot only specific
features in the dataset. Let’s try this now.
Recall that further above, we used the country argument
in ne_countries() to return a subset of the countries
provided by {rnaturalearth}. For example, to return the map of Nigeria
and Niger, we ran:
# Map of Nigeria and Niger
nigeria_niger <- ne_countries(type = "countries", returnclass = "sf",
country = c("nigeria", "niger"))
ggplot(data = nigeria_niger) +
geom_sf()We can now achieve the same result by taking the data frame for all
countries and using dplyr::filter() to filter it down to
just Nigeria and Niger:
nigeria_niger_filtered <-
countries_select %>%
filter(name %in% c("Nigeria", "Niger"))
nigeria_niger_filtered## Simple feature collection with 2 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 0.2956464 ymin: 4.240594 xmax: 15.90325 ymax: 23.47167
## Geodetic CRS: SOURCECRS
## name pop_est geometry
## 1 Niger 15306252 MULTIPOLYGON (((2.154474 11...
## 2 Nigeria 149229090 MULTIPOLYGON (((8.500288 4....
As before, we can plot this object with ggplot() and
geom_sf():
ggplot(data = nigeria_niger_filtered) +
geom_sf()Excellent! So you see now that sf objects can be
manipulated like data frames: we can select specific
columns and filter specific rows, for example. Soon we will
see how to perform other data frame operations on sf
objects, like mutating, summarizing and joining.
For now, try out the practice questions below.
◘ From the afrilearndata::afriairports dataset, filter
then plot only the airports in South Africa. Use filter(),
ggplot() and geom_sf()
q_south_africa_airports_plot <-
afrilearndata::afriairports %>%
filter(country_name == "South Africa") %>%
ggplot() +
geom_sf()
q_south_africa_airports_plot◘ From the rivers dataset previously defined, filter
then plot the two features named “Nile”
q_nile <-
rivers %>%
filter(name == "Nile") %>%
ggplot() +
geom_sf()
q_nilesf objects with mapview::mapview()@INCOMPLETE
A nice way to get a quick overview of all the fields and features in a map object is with mapview::mapview()
When you click on any feature, you see a list of fields (attributes):
[SCREENSHOT]
@INCOMPLETE
The following team members contributed to this lesson:
Some material in this lesson was adapted from the following sources:
Batra, Neale, et al. (2021). The Epidemiologist R Handbook. Chapter 28: GIS Basics. (2021). Retrieved 01 April 2022, from https://epirhandbook.com/en/gis-basics.html
Moreno, Mel, and Mathieu Bastille. “Mel Moreno and Mathieu Basille.” Drawing beautiful maps programmatically with R, sf and ggplot2 — Part 1: Basics, October 25, 2018. https://r-spatial.org/r/2018/10/25/ggplot2-sf.html.
Lovelace, R., Nowosad, J., & Muenchow, J. Geocomputation with R. Chapter 2: Geographic data in R. (2019). Retrieved 01 April 2022, from https://geocompr.robinlovelace.net/spatial-class.html
Moraga, Paula. Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapter 2: Spatial data and R packages for mapping. (2019). Retrieved 01 April 2022, from https://www.paulamoraga.com/book-geospatial/sec-spatialdataandCRS.html
Baumer, Benjamin S., Kaplan, Daniel TRUE., and Horton, Nicholas J. Modern Data Science with R. Chapter 17: Working with geospatial data. (2021). Retrieved 05 June 2022, from https://mdsr-book.github.io/mdsr2e/ch-spatial.html